Privacy-Preserving Text Indexing for Search of Documents

Author

  • GYULA SALLAI
Abstract

Protecting the content of sensitive text documents is important in enterprise intranets. An index structure is needed to support efficient search and retrieval, but it can leak information: through statistical attacks, an adversary can draw probabilistic inferences about the contents of the document collection. Zerr et al. present a confidential index structure and a ranking of the documents retrieved for a query, but only for single-term queries. The solution proposed in this paper generalizes Zerr’s method by using an anonymization parameter and query-dependent anonymized inverse document frequency factors; it thereby provides better ranking and makes multi-term queries possible.

Key-Words: information retrieval, confidential text indexing, inverted index, posting list, inverse document frequency, r-confidentiality, anonymization, ranking
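For orientation, the sketch below shows the ordinary, non-confidential baseline that the abstract builds on: an inverted index with posting lists, inverse document frequency (IDF) weights, and ranking of a multi-term query by summed tf × idf. It does not implement Zerr's confidential index, the anonymization parameter, or r-confidentiality; the function names, the whitespace tokenizer, the log(N/df) form of IDF, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a posting list {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization (an assumption, not from the paper).
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def idf(index, term, n_docs):
    """Plain IDF, log(N / df); 0.0 for terms absent from the index."""
    df = len(index.get(term, {}))
    return math.log(n_docs / df) if df else 0.0


def rank(index, query, n_docs):
    """Score documents by summing tf * idf over all query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        weight = idf(index, term, n_docs)
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf * weight
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical three-document collection.
docs = {
    "d1": "merger plan for the finance team",
    "d2": "finance report and budget",
    "d3": "holiday plan",
}
index = build_index(docs)
results = rank(index, "merger plan", len(docs))
```

In this plain form the posting lists and document frequencies are exposed as they are, which is the kind of statistical information the abstract says an adversary could exploit. A confidential scheme would have to obscure those statistics while keeping the ranking useful.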

Related resources

A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine

Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...

Privacy preserving document indexing infrastructure for a distributed environment

To carry out work assignments, small groups distributed within a larger enterprise or collaborative community often need to share documents among themselves while shielding those documents from others’ eyes. In this situation, users need an indexing facility that can quickly locate relevant documents that they are allowed to access, without (1) leaking information about the remaining documents,...

Spatio-textual Indexing for Geographical Search on the Web

Many web documents refer to specific geographic localities and many people include geographic context in queries to web search engines. Standard web search engines treat the geographical terms in the same way as other terms. This can result in failure to find relevant documents that refer to the place of interest using alternative related names, such as those of included or nearby places. This ...

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

Publication date: 2012